Non-Disjoint Discretization for Aggregating One-Dependence Estimator Classifiers

نویسندگان

  • Ana M. Martínez
  • Geoffrey I. Webb
  • M. Julia Flores
  • José A. Gámez
چکیده

There is still lack of clarity about the best manner in which to handle numeric attributes when applying Bayesian network classifiers. Discretization methods entail an unavoidable loss of information. Nonetheless, a number of studies have shown that appropriate discretization can outperform straightforward use of common, but often unrealistic parametric distribution (e.g. Gaussian). Previous studies have shown the Averaged One-Dependence Estimators (AODE) classifier and its variant Hybrid AODE (HAODE, which deals with numeric and discrete variables) to be robust towards the discretization method applied. However, all the discretization techniques taken into account so far formed nonoverlapping intervals for a numeric attribute. We argue that the idea of non-disjoint discretization, already justified in Naive Bayes classifiers, can also be profitably extended to AODE and HAODE, albeit with some variations; and our experimental results seem to support this hypothesis, specially for the latter.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Non-Disjoint Discretization for Naive-Bayes Classifiers

Previous discretization techniques have discretized numeric attributes into disjoint intervals. We argue that this is neither necessary nor appropriate for naive-Bayes classifiers. The analysis leads to a new discretization method, Non-Disjoint Discretization (NDD). NDD forms overlapping intervals for a numeric attribute, always locating a value toward the middle of an interval to obtain more r...

متن کامل

Creating Ensembles of Classifiers

Ensembles of classifiers offer promise in increasing overall classification accuracy. The availability of extremely large datasets has opened avenues for application of distributed and/or parallel learning to efficiently learn models of them. In this paper, distributed learning is done by training classifiers on disjoint subsets of the data. We examine a random partitioning method to create dis...

متن کامل

A NEURO-FUZZY GRAPHIC OBJECT CLASSIFIER WITH MODIFIED DISTANCE MEASURE ESTIMATOR

The paper analyses issues leading to errors in graphic object classifiers. Thedistance measures suggested in literature and used as a basis in traditional, fuzzy, andNeuro-Fuzzy classifiers are found to be not suitable for classification of non-stylized orfuzzy objects in which the features of classes are much more difficult to recognize becauseof significant uncertainties in their location and...

متن کامل

A New Hybrid Framework for Filter based Feature Selection using Information Gain and Symmetric Uncertainty (TECHNICAL NOTE)

Feature selection is a pre-processing technique used for eliminating the irrelevant and redundant features which results in enhancing the performance of the classifiers. When a dataset contains more irrelevant and redundant features, it fails to increase the accuracy and also reduces the performance of the classifiers. To avoid them, this paper presents a new hybrid feature selection method usi...

متن کامل

Technical Report No: BU-CE-1001 A Discretization Method based on Maximizing the Area Under ROC Curve

We present a new discretization method based on Area under ROC Curve (AUC) measure. Maximum Area under ROC Curve Based Discretization (MAD) is a global, static and supervised discretization method. It discretizes a continuous feature in a way that the AUC based only on that feature is to be maximized. The proposed method is compared with alternative discretization methods such as Entropy-MDLP (...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2012